Finding New Rules for Incomplete Theories: Induction with Explicit Biases in Varying Contexts
Many AI problem solvers possess explicitly encoded knowledge - a domain theory - that they use to solve problems. If these problem solvers are to be autonomous, they must be able to detect and to fill gaps in their own knowledge. The field of machine learning addresses this issue. Recently two disparate machine learning approaches have emerged as predominant in the field: explanation-based learning (EBL) and similarity-based learning (SBL). EBL and SBL have been applied to problems in a variety of domains. Both methods have clear problems, however. EBL assumes that a system is given an explicit theory of the domain that is complete, correct, and tractable. These assumptions are clearly unrealistic for most complex, real-world problems. SBL suffers because of its lack of an explicit theory of the domain. The simplicity of the method requires that human intervention play a large role in tailoring input examples and the features describing them in such a way as to allow a system to choose an appropriate set of features to define a concept. Biasing a system in this way may result in its being unable to discover all concepts in even a single domain. Less tailoring of the examples leaves a system open to the possibility of not converging on the best definition for a concept, or on any at all, due to the computational complexity. The research described in this proposal addresses a number of the problems found in explanation-based and similarity-based learning. The major focus of the research is the elimination of the assumption that the domain theory of an EBL system is complete. In particular, it considers the problem of working with an incomplete theory by suggesting a method by which gaps in an EBL system's knowledge can be detected and filled. We suggest that when EBL cannot derive a complete explanation, the partial explanation provides a context in which learning takes place.
Information extracted from partial explanations, as well as from complete explanations, can be exploited by SBL to do better induction of the missing domain knowledge. The extracted information constitutes an explicit bias for similarity-based learning. A second problem to be addressed is that of making the biases of SBL explicit. Finally, all testing of the claims made in this proposal is to be done in the Gemini learning system. The development of the system addresses the goal of constructing an integrated learning architecture utilizing both EBL and SBL.
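The mechanism can be illustrated with a deliberately simplified sketch. This is not the Gemini system; every function name, predicate, and example below is invented for illustration. The predicates mentioned by a partial explanation become an explicit feature bias that restricts which attributes a toy inductive learner may use when filling the gap in the theory.

```python
# Toy sketch: the attributes a partial explanation touched become an
# explicit bias for induction. All names and data here are invented.
def bias_from_partial_explanation(partial_proof):
    """Collect the attributes mentioned by the partial explanation."""
    return {pred for step in partial_proof for pred in step["uses"]}

def induce_rule(examples, feature_bias):
    """Toy induction: keep attribute-value pairs (within the bias)
    shared by every positive example and by no negative example."""
    pos = [e for e in examples if e["label"]]
    neg = [e for e in examples if not e["label"]]
    rule = {}
    for f in feature_bias:
        values = {e[f] for e in pos}
        if len(values) == 1 and all(e[f] != next(iter(values)) for e in neg):
            rule[f] = values.pop()
    return rule

proof = [{"uses": ["color", "shape"]}]  # steps of a partial explanation
examples = [
    {"color": "red", "shape": "cup", "size": "big", "label": True},
    {"color": "red", "shape": "cup", "size": "small", "label": True},
    {"color": "red", "shape": "bowl", "size": "big", "label": False},
]
rule = induce_rule(examples, bias_from_partial_explanation(proof))
# "size" is never considered: the explanation did not mention it
```

Here the learner converges on the single discriminating attribute within the bias ("shape"), while the unbiased attribute "size" is excluded from consideration entirely.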
Computing Competencies for Undergraduate Data Science Curricula: ACM Data Science Task Force
At the August 2017 ACM Education Council meeting, a task force was formed to explore a process to add to the broad, interdisciplinary conversation on data science, with an articulation of the role of computing discipline-specific contributions to this emerging field. Specifically, the task force would seek to define what the computing/computational contributions are to this new field, and provide guidance on computing-specific competencies in data science for departments offering such programs of study at the undergraduate level.
There are many stakeholders in the discussion of data science – these include colleges and universities that (hope to) offer data science programs, employers who hope to hire a workforce with knowledge and experience in data science, as well as individuals and professional societies representing the fields of computing, statistics, machine learning, computational biology, computational social sciences, digital humanities, and others. There is a shared desire to form a broad interdisciplinary definition of data science and to develop curriculum guidance for degree programs in data science.
This volume builds upon the important work of other groups who have published guidelines for data science education. There is a need to acknowledge the definition and description of the individual contributions to this interdisciplinary field. For instance, those interested in the business context for these concepts generally use the term “analytics”; in some cases, the abbreviation DSA appears, meaning Data Science and Analytics.
This volume is the third draft articulation of computing-focused competencies for data science. It recognizes the inherent interdisciplinarity of data science and situates computing-specific competencies within the broader interdisciplinary space.
A Survey of Machine Learning Systems Integrating Explanation-Based and Similarity-Based Methods
Two disparate machine learning approaches have received considerable attention. These are explanation-based and similarity-based learning. The basic goal of an explanation-based learning system is to recognize more efficiently concepts that it is already capable of recognizing. The learning process involves a knowledge-intensive analysis of an environment-provided example of a concept in order to extract its characteristic features. The basic goal of a similarity-based system, on the other hand, is to acquire descriptions that allow the system to recognize concepts it does not yet know. Although they have been applied with some success to problems in a variety of domains, both methods have clear deficiencies. Explanation-based learning assumes that a system will be provided with an explicit domain theory that is complete, correct, and tractable. This assumption is unrealistic for many complex, real-world domains. Similarity-based learning suffers because of its lack of an explicit theory. Since the two methods are complementary in nature, an obvious solution is to augment systems using one approach with techniques from the other. This survey discusses machine learning systems that integrate explanation-based and similarity-based learning methods such that one is incorporated primarily to handle a deficiency of the other. Although sufficient background material is provided that the reader need not be familiar with machine learning, general knowledge of AI is assumed.
Extraction and Use of Contextual Attributes for Theory Completion: An Integration of Explanation-Based and Similarity-Based Learning
Andrea Pohoreckyj Danyluk. This research investigates the use of contextual cues to address problems in machine learning that arise from assumptions about the initial knowledge that is necessary for the acquisition of new information. Machine learning approaches may be placed along a spectrum describing purely inductive to purely deductive techniques. Inductive systems possess essentially no explicit knowledge that can be used in acquiring new facts, while deductive systems are assumed to contain a complete theory of the domain. Most work in machine learning has concentrated on approaches at the two ends of the spectrum. This dissertation describes an approach that integrates inductive and deductive methods. It provides a mechanism by which induction can be used in order to detect and acquire knowledge missing from the domain theory of a deductive system.
A Comparison of Data Sources for Machine Learning in a Telephone Trouble Screening Expert System
This paper describes a domain where the application of machine learning, specifically inductive learning, could have enormous positive impact. The domain possesses attributes that would indicate that inductive learning would easily succeed for this domain. In particular, data for this domain are abundant. In spite of this, numerous machine learning methods -- both inductive and otherwise -- have failed to learn a knowledge base having high accuracy. This paper presents a comparison of the data sources available for this domain. It focuses primarily on a survey system that was ultimately designed for the purpose of collecting data best suited to this task.
Keywords: knowledge acquisition for expert systems; knowledge elicitation; data collection; data collection interfaces
This research was performed while the author was an employee of NYNEX Science and Technology, Inc.
Artificial Intelligence Competencies for Data Science Undergraduate Curricula
In August 2017, the ACM Education Council initiated a task force to add to the broad, interdisciplinary conversation on data science, with an articulation of the role of computing discipline-specific contributions to this emerging field. Specifically, the task force is seeking to define what the computing contributions are to this new field, in order to provide guidance for computer science or similar departments offering data science programs of study at the undergraduate level. The ACM Data Science Task Force has completed the initial draft of a curricular report. The computing-knowledge areas identified in the report are drawn from across computing disciplines and include several sub-areas of AI. This short paper describes the overall project, highlights AI-relevant areas, and seeks to open a dialog about the AI competencies that are to be considered central to a data science undergraduate curriculum.
Off-Topic Detection in Conversational Telephone Speech
In a context where information retrieval is extended to spoken "documents" including conversations, it will be important to provide users with the ability to seek informational content, rather than socially motivated small talk that appears in many conversational sources. In this paper we present a preliminary study aimed at automatically identifying "irrelevance" in the domain of telephone conversations. We apply a standard machine learning algorithm to build a classifier that detects off-topic sections with better-than-chance accuracy and that begins to provide insight into the relative importance of features for identifying utterances as on topic or not.
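As a rough illustration of the task (not the paper's actual classifier, features, or data), a minimal bag-of-words Naive Bayes model can separate invented on-topic utterances from invented small talk:

```python
# Illustrative sketch only: a tiny Naive Bayes text classifier for
# on-topic vs. small-talk utterances. All utterances are invented.
import math
from collections import Counter, defaultdict

def train(utterances):
    """utterances: list of (text, label) pairs."""
    word_counts = defaultdict(Counter)   # label -> word frequencies
    label_counts = Counter()
    for text, label in utterances:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab

def classify(text, model):
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best, best_lp = None, float("-inf")
    for label in label_counts:
        lp = math.log(label_counts[label] / total)   # log prior
        n = sum(word_counts[label].values())
        for w in text.lower().split():
            # Laplace smoothing over the shared vocabulary
            lp += math.log((word_counts[label][w] + 1) / (n + len(vocab)))
        if lp > best_lp:
            best, best_lp = label, lp
    return best

model = train([
    ("how was your weekend", "small_talk"),
    ("nice weather today", "small_talk"),
    ("the budget report is due friday", "on_topic"),
    ("schedule the project review meeting", "on_topic"),
])
print(classify("is the report due soon", model))  # -> on_topic
```

A real system would of course use far richer features (prosody, position in the conversation, speaker turns) than raw word counts; the sketch only shows the better-than-chance classification setup.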
Problem Definition, Data Cleaning, and Evaluation: A Classifier Learning Case Study
This paper is a case study of the classifier-learning process, based on a long-term project addressing the automatic dispatch of technicians to fix faults in the local loop of a telephone network. The bottom line of the project is that simple learning techniques can be effective. However, constructing a convincing argument to that effect is far from simple. In particular, we had to consult multiple sources to obtain class labels, use domain knowledge to clean up data, compare with existing methods, and evaluate with data from multiple locations. Finally, it was necessary to use decision-analytic techniques to evaluate the cost-effectiveness of the learned classifiers, because evaluation based on classification accuracy is misleading without an analysis of cost-effectiveness. Our view is that application studies should be helpful in guiding future research. Therefore, we conclude by outlining useful directions suggested by our experience on this long-term project.
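The point about accuracy versus cost-effectiveness can be illustrated with a small invented example: two classifiers with identical accuracy can have very different expected dispatch costs. The dollar figures and labels below are made up for illustration and are not the project's.

```python
# Illustrative sketch: accuracy alone is misleading; a cost matrix
# exposes the difference. All cost figures below are invented.
COSTS = {  # (predicted action, actual state) -> cost in dollars
    ("dispatch", "fault"): 50,       # correct dispatch: technician time
    ("dispatch", "no_fault"): 200,   # wasted truck roll
    ("no_dispatch", "fault"): 500,   # missed fault: repeat trouble report
    ("no_dispatch", "no_fault"): 0,  # correctly avoided dispatch
}

def accuracy(preds, actuals):
    correct = sum((p == "dispatch") == (a == "fault")
                  for p, a in zip(preds, actuals))
    return correct / len(preds)

def expected_cost(preds, actuals):
    return sum(COSTS[(p, a)] for p, a in zip(preds, actuals)) / len(preds)

actuals = ["fault"] * 5 + ["no_fault"] * 5
# Classifier A misses two faults; Classifier B makes two false dispatches.
preds_a = ["dispatch"] * 3 + ["no_dispatch"] * 7
preds_b = ["dispatch"] * 7 + ["no_dispatch"] * 3

print(accuracy(preds_a, actuals), accuracy(preds_b, actuals))  # 0.8 0.8
print(expected_cost(preds_a, actuals))  # 115.0 per case
print(expected_cost(preds_b, actuals))  # 65.0 per case
```

Both classifiers are 80% accurate, yet B's errors (unneeded dispatches) cost far less per case than A's (missed faults), which is exactly the kind of distinction a pure-accuracy evaluation hides.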
Telecommunications Network Diagnosis
The Scrubber 3 system monitors problems in the local loop of the telephone network, making automated decisions on tens of millions of cases a year, many of which lead to automated actions. Scrubber saves Bell Atlantic millions of dollars annually, by reducing the number of inappropriate technician dispatches. Scrubber's core knowledge base, the Trouble Isolation Module (TIM), is a probability estimation tree constructed via several data mining processes. TIM currently is deployed in the Delphi system, which serves knowledge to multiple applications. As compared to previous approaches, TIM is more general, more robust, and easier to update when the network or user requirements change. Under certain circumstances it also provides better classifications. In fact, TIM's knowledge is general enough that it now serves a second deployed application. One of the most interesting aspects of the construction of TIM is that data mining was used not only in the traditional sense, namely, building a model from a warehouse of actual historical cases. Data mining also was used to produce an understandable model of the knowledge contained in an earlier, successful diagnostic system.
NYU, Stern School of Business, IOMS Department, Center for Digital Economy Research
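The notion of a probability estimation tree can be sketched as a decision tree whose leaves hold probability estimates rather than hard class labels. The structure, attribute names, and numbers below are invented for illustration; they are not TIM's.

```python
# Minimal sketch of a probability estimation tree. Internal nodes test
# an attribute; leaves store an estimated probability of a fault.
# All attributes, branches, and probabilities here are invented.
class Node:
    def __init__(self, attr=None, branches=None, prob=None):
        self.attr = attr          # attribute tested at internal nodes
        self.branches = branches  # value -> child Node
        self.prob = prob          # P(fault in local loop) at leaves

    def estimate(self, case):
        if self.prob is not None:            # leaf: return the estimate
            return self.prob
        return self.branches[case[self.attr]].estimate(case)

# Hand-built toy tree (structure and numbers are illustrative only)
tree = Node(attr="line_test", branches={
    "open": Node(prob=0.92),
    "ok": Node(attr="repeat_report", branches={
        True:  Node(prob=0.35),
        False: Node(prob=0.05),
    }),
})

p = tree.estimate({"line_test": "ok", "repeat_report": True})  # 0.35
# Downstream applications can apply their own decision thresholds,
# e.g. dispatch a technician only when the estimate is high enough.
dispatch = p > 0.5
```

Returning probabilities rather than hard classifications is one plausible reason such a knowledge base can serve multiple applications: each consumer can combine the estimate with its own costs and thresholds.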